13 research outputs found

    Periodic Motion Detection and Estimation via Space-Time Sampling

    Full text link
    A novel technique to detect and localize periodic movements in video is presented. The distinctive feature of the technique is that it requires neither feature tracking nor object segmentation. Intensity patterns along linear sample paths in space-time are used in estimation of period of object motion in a given sequence of frames. Sample paths are obtained by connecting (in space-time) sample points from regions of high motion magnitude in the first and last frames. Oscillations in intensity values are induced at time instants when an object intersects the sample path. The locations of peaks in intensity are determined by parameters of both cyclic object motion and orientation of the sample path with respect to object motion. The information about peaks is used in a least squares framework to obtain an initial estimate of these parameters. The estimate is further refined using the full intensity profile. The best estimate for the period of cyclic object motion is obtained by looking for consensus among estimates from many sample paths. The proposed technique is evaluated with synthetic videos where ground-truth is known, and with American Sign Language videos where the goal is to detect periodic hand motions.National Science Foundation (CNS-0202067, IIS-0308213, IIS-0329009); Office of Naval Research (N00014-03-1-0108

    Exploiting phonological constraints for handshape recognition in sign language video

    Full text link
    The ability to recognize handshapes in signing video is essential in algorithms for sign recognition and retrieval. Handshape recognition from isolated images is, however, an insufficiently constrained problem. Many handshapes share similar 3D configurations and are indistinguishable for some hand orientations in 2D image projections. Additionally, significant differences in handshape appearance are induced by the articulated structure of the hand and variants produced by different signers. Linguistic rules involved in the production of signs impose strong constraints on the articulations of the hands, yet, little attention has been paid towards exploiting these constraints in previous works on sign recognition. Among the different classes of signs in any signed language, lexical signs constitute the prevalent class. Morphemes (or, meaningful units) for signs in this class involve a combination of particular handshapes, palm orientations, locations for articulation, and movement type. These are thus analyzed by many sign linguists as analogues of phonemes in spoken languages. Phonological constraints govern the ways in which phonemes combine in American Sign Language (ASL), as in other signed and spoken languages; utilizing these constraints for handshape recognition in ASL is the focus of the proposed thesis. Handshapes in monomorphemic lexical signs are specified at the start and end of the sign. The handshape transition within a sign are constrained to involve either closing or opening of the hand (i.e., constrained to exclusively use either folding or unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra- signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation to exploit handshape co-occurrence constraints also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of ASL video annotated with linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign

    Challenges in development of the American Sign Language Lexicon Video Dataset (ASLLVD) corpus

    Full text link
    The American Sign Language Lexicon Video Dataset (ASLLVD) consists of videos of >3,300 ASL signs in citation form, each produced by 1-6 native ASL signers, for a total of almost 9,800 tokens. This dataset, including multiple synchronized videos showing the signing from different angles, will be shared publicly once the linguistic annotations and verifications are complete. Linguistic annotations include gloss labels, sign start and end time codes, start and end handshape labels for both hands, morphological and articulatory classifications of sign type. For compound signs, the dataset includes annotations for each morpheme. To facilitate computer vision-based sign language recognition, the dataset also includes numeric ID labels for sign variants, video sequences in uncompressed-raw format, camera calibration sequences, and software for skin region extraction. We discuss here some of the challenges involved in the linguistic annotations and categorizations. We also report an example computer vision application that leverages the ASLLVD: the formulation employs a HandShapes Bayesian Network (HSBN), which models the transition probabilities between start and end handshapes in monomorphemic lexical signs. Further details and statistics for the ASLLVD dataset, as well as information about annotation conventions, are available from http://www.bu.edu/asllrp/lexicon

    Face Identification by a Cascade of Rejection Classifiers

    Full text link
    Nearest neighbor search is commonly employed in face recognition but it does not scale well to large dataset sizes. A strategy to combine rejection classifiers into a cascade for face identification is proposed in this paper. A rejection classifier for a pair of classes is defined to reject at least one of the classes with high confidence. These rejection classifiers are able to share discriminants in feature space and at the same time have high confidence in the rejection decision. In the face identification problem, it is possible that a pair of known individual faces are very dissimilar. It is very unlikely that both of them are close to an unknown face in the feature space. Hence, only one of them needs to be considered. Using a cascade structure of rejection classifiers, the scope of nearest neighbor search can be reduced significantly. Experiments on Face Recognition Grand Challenge (FRGC) version 1 data demonstrate that the proposed method achieves significant speed up and an accuracy comparable with the brute force Nearest Neighbor method. In addition, a graph cut based clustering technique is employed to demonstrate that the pairwise separability of these rejection classifiers is capable of semantic grouping.National Science Foundation (EIA-0202067, IIS-0329009); Office of Naval Research (N00014-03-1-0108

    Exploiting Phonological Constraints for Handshape Inference in ASL Video

    Full text link
    Handshape is a key articulatory parameter in sign language, and thus handshape recognition from signing video is essential for sign recognition and retrieval. Handshape transitions within monomorphemic lexical signs (the largest class of signs in signed languages) are governed by phonological rules. For example, such transitions normally involve either closing or opening of the hand (i.e., to exclusively use either folding or unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra- signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation to exploit handshape co-occurrence constraints, also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of American Sign Language (ASL) video. The video have been annotated using SignStream® [Neidle et al.] with labels for linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign.National Science Foundation grants 0705749 and 085506

    Layered graphical models for tracking partially-occluded objects

    No full text
    Partial occlusions are commonplace in a variety of real world computer vision applications: surveillance, intelligent environments, assistive robotics, autonomous navigation, etc. While occlusion handling methods have been proposed, most methods tend to break down when confronted with numerous occluders in a scene. In this paper, a layered image-plane representation for tracking people through substantial occlusions is proposed. An imageplane representation of motion around an object is associated with a pre-computed graphical model, which can be instantiated efficiently during online tracking. A global state and observation space is obtained by linking transitions between layers. A Reversible Jump Markov Chain Monte Carlo approach is used to infer the number of people and track them online. The method outperforms two stateof-the-art methods for tracking over extended occlusions, given videos of a parking lot with numerous vehicles and a laboratory with many desks and workstations. 1

    Coupling detection and data association for multiple object tracking

    No full text
    We present a novel framework for multiple object tracking in which the problems of object detection and data association are expressed by a single objective function. The framework follows the Lagrange dual decomposition strategy, taking advantage of the often complementary nature of the two subproblems. Our coupling formulation avoids the problem of error propagation from which traditional “detection-tracking approaches ” to multiple object tracking suffer. We also eschew common heuristics such as “nonmaximum suppression ” of hypotheses by modeling the joint image likelihood as opposed to applying independent likelihood assumptions. Our coupling algorithm is guaranteed to converge and can handle partial or even complete occlusions. Furthermore, our method does not have any severe scalability issues but can process hundreds of frames at the same time. Our experiments involve challenging, notably distinct datasets and demonstrate that our method can achieve results comparable to those of state-of-art approaches, even without a heavily trained object detector. 1

    Parameter Sensitive Detectors

    No full text
    Object detection can be challenging when the object class exhibits large variations. One commonly-used strategy is to first partition the space of possible object variations and then train separate classifiers for each portion. However, with continuous spaces the partitions tend to be arbitrary since there are no natural boundaries (for example, consider the continuous range of human body poses). In this paper, a new formulation is proposed, where the detectors themselves are associated with continuous parameters, and reside in a parameterized function space. There are two advantages of this strategy. First, a-priori partitioning of the parameter space is not needed; the detectors themselves are in a parameterized space. Second, the underlying parameters for object variations can be learned from training data in an unsupervised manner. In profile face detection experiments, at a fixed false alarm number of 90, our method attains a detection rate of 75 % vs. 70% for the method of Viola-Jones. In hand shape detection, at a false positive rate of 0.1%, our method achieves a detection rate of 99.5 % vs. 98 % for partition based methods. In pedestrian detection, our method reduces the miss detection rate by a factor of three at a false positive rate of 1%, compared with the method of Dalal-Triggs. 1

    Multiplicative Kernels: Object Detection, Segmentation and Pose Estimation

    No full text
    Object detection is challenging when the object class exhibits large within-class variations. In this work, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly learned in a multiplicative form of two kernel functions. One kernel measures similarity for foreground-background classification. The other kernel accounts for latent factors that control within-class variation and implicitly enables feature sharing among foreground training samples. Detector training can be accomplished via standard SVM learning. The resulting detectors are tuned to specific variations in the foreground class. They also serve to evaluate hypotheses of the foreground state. When the foreground parameters are provided in training, the detectors can also produce parameter estimate. When the foreground object masks are provided in training, the detectors can also produce object segmentation. The advantages of our method over past methods are demonstrated on data sets of human hands and vehicles. 1
    corecore